Multiple sequencible and ligatible structures for genomic analysis

ABSTRACT

High throughput methods and kits for single nucleotide polymorphism (SNP) genotyping are provided. The methods utilize nested PCR amplification reactions which produce sequencible and ligatible structures. An outer PCR primer set amplifies the SNP, and an inner PCR primer set amplifies a portion of the DNA amplified by the outer primer set, but does not amplify the SNP itself. The inner and outer primers may include non-target common domain sequences, and the inner primer common domain sequences may comprise restriction endonuclease recognition sites. The design of the inner primer set allows precise tailoring of the sequencible and ligatible structures with respect to length and base composition.

BACKGROUND OF THE INVENTION

1. Field of the Invention

The invention generally relates to single nucleotide polymorphism (SNP) genotyping. In particular, the invention provides a method for SNP genotyping based on a nested PCR design that creates structures directly suitable for both DNA sequencing and ligation reactions.

2. Background of the Invention

With the completion of the rough draft of the human genome, over 1.5 million non-redundant SNPs have been identified and mapped by the publicly funded project and the SNP consortium (Sachidanandam et al, 2001). The availability of such a large number of SNPs has prompted great interest in SNP related applications and research, such as large scale linkage, genomic association and pharmacogenetic studies.

SNPs entered the central arena of genetics due to their abundance (Collins et al, 1997; Sachidanandam et al, 2001), stability (Sachidanandam et al, 2001) and the relative ease (Kwok 2000; Shi 2001; Syvanen 2002) with which they are genotyped. The targets of SNP genotyping are normally small pieces of DNA, ranging from 150-300 base pairs. Based on the SNP consortium report, the average SNP density in the human genome appears to be greater than 1 SNP/kb. This high density provides a tool to analyze genomic structure at very high resolution, and to establish a direct correlation between genetic coding variation and biological function. The relative stability of SNPs (compared to mini- and micro-satellite markers) in evolution makes it simpler to carry out this task.

Although SNP genotyping is generally easier than microsatellite genotyping, and despite the rapid and significant advances in SNP genotyping technology in the last few years, major issues remain unsolved. Whereas only a few years ago the main concern was the need to have a large number of SNPs available, currently the most urgent issues are increasing throughput and decreasing cost. For example, for large scale applications as projected by Kruglyak (1999) and Long (1999), a reasonable cost would be <$0.10/genotype, or else large-scale projects will become prohibitively expensive. To fulfill the goals of understanding the genetics of complex traits and common diseases, cost-effective and higher throughput methodology in SNP genotyping is essential.

Typically, genotyping protocols have three components: target amplification, allelic discrimination and product detection and identification. They are normally executed sequentially, but can be processed in a single step reaction, depending on the means which are used for allele discrimination and signal detection. Such single step procedures are well-suited to automation, but are not necessarily effective in terms of cost and throughput.

PCR is the dominant procedure utilized for target amplification. As is widely known, a PCR can readily amplify targets present in low copy number by a factor of 10^8 or more. With respect to allele discrimination, all SNP genotyping technologies currently available are based on mechanisms involving one or more of the following: DNA polymerase, hybridization and DNA ligase. DNA polymerase methods include single base extension (SBE) (Armstrong 2000; Barta 2001; Bray 2001; Cai 2001; Chen 1997; Chen 1999; Chen 1998; Chen 1997; Chen 1997; Fan 2000; Lindblad-Toh 2000; Nikiforov 1994; Pastinen 1996; Pastinen 2000; Ross 1998; Sauer 2000; Syvanen 1999; Ye 2001), de novo sequencing including pyrosequencing (Nordstrom 2000; Ronaghi 2001), allele-specific PCR and extension (Germer 1999; Myakishev 2001; Pastinen 2000), and structure-specific cleavage (Fors 2000). Of these different forms, SBE is the most widely used and has been adapted for many different detection platforms. Hybridization is also widely used in several different forms, including dynamic hybridization (Prince 2001), and is the primary method currently used in all microarray detection formats. DNA ligase methods are based on the ability of DNA ligase to join the ends of two oligonucleotides annealed next to each other on a template (Tong 2000; Tong 1999). Two oligonucleotides can be designed to anneal to both sides of a SNP site, and by detecting the formation of ligation product, the genotype of a target can be inferred (Chen 1998; Delahunty 1996; Gerry 1999; Iannone 2000).

Once a target DNA sequence has been amplified and the allelic variants discriminated, the next step in a generic genotyping protocol is to detect and identify the allele specific products. Detection mechanisms vary greatly, from simple fluorescence intensity (Armstrong 2000; Cai 2000; Chen 1998; Chen 1997; Delahunty 1996; Dubertret 2001; Fan 2000; Ferguson 2000; Fors 2000; Germer 1999; Germer 2000; Gerry 1999; Iannone 2000; Lindblad-Toh 2000; Lindroos 2001; Marras 1999; Medintz 2001; Myakishev 2001; Nikiforov 1994; Pastinen 1997; Pastinen 2000; Prince 2001; Syvanen 1999; Ye 2001) to very precise mass (Bray 2001; Ross 1998; Sauer 2000) or electric charge (Gilles 1999; Woolley 2000) measurement. The detection mechanisms roughly fall into two categories: homogeneous and solid phase mediated detection. All homogeneous detection platforms depend on measuring fluorescence intensities and their change during and/or after reactions. One common feature among homogeneous approaches is that they do not require any separation/purification prior to signal acquisition, making them amenable to automation. Solid support techniques include flow cytometry genotyping (Armstrong 2000; Cai 2000; Ye 2001), zip-code microarrays (Fan 2000; Gerry 1999), and mass spectrometry genotyping (Bray 2001; Ross 1998; Sauer 2000). Using a solid support in the detection step may potentially increase throughput and reduce cost, but unfortunately also entails the risk of complicating protocols and compromising data quality. For example, when reaction mixtures are applied to a solid support, unintended binding of fluorescence dye to the solid surface can occur, necessitating extensive washing to minimize spurious signals.

With respect to cost, prices for genotyping technologies currently available on the market vary considerably, from $0.50-2.00/genotype. For example, the template-directed dye-terminator incorporation assay with fluorescence detection (FP-TDI) (Chen 1999) costs roughly $0.50/genotype for reagents by list price. For other methods, due to their requirement for special reagents and/or clean-up procedures, the cost is higher, e.g. MALDI-TOF mass spectrometry is about $0.75-1.00, whereas pyrosequencing approaches $2.00/genotype. No currently available technology approaches the low cost which is necessary in order to prevent large-scale projects from becoming prohibitively expensive, i.e. <$0.10/genotype.

There are currently many genotyping applications in use or under development that require a relatively large number of both SNPs and samples (e.g., both in the thousands or more), demanding both cost effective and high throughput technologies. Examples include gene mapping studies by linkage or linkage disequilibrium (Kruglyak 1999; Long 1999) for complex traits and pharmacogenetics (Riley 2000). Clearly, these applications are critical to improving our understanding of the genetics of complex traits, common diseases and drug response, and to help attain the goal of individualized medicine. However, none of the current approaches are suitable for such applications in terms of cost effectiveness coupled with high throughput potential, and SNP genotyping demands both.

SUMMARY OF THE INVENTION

The present invention provides a methodology for high-throughput SNP genotyping that is highly cost effective. This novel approach makes use of the high throughput capacity of DNA sequencers for SNP genotyping and is based on a nested PCR design that creates a series of ordered “structures” in parallel and exploits them for SNP genotyping assays. The invention thus provides novel genotyping technologies that significantly increase throughput, reduce cost and are accessible for most researchers, leading to significant improvements in genotyping technology. While typical SNP genotyping approaches currently available score <10 SNPs simultaneously, the methods of the present invention increase the number to about 50-100 with obvious advantages for throughput and cost.

The present invention provides a method of producing hybrid DNA with a single strand overhang that includes a target sequence. The steps of the method include: obtaining a first primer which hybridizes to a 5′ strand of a strand of deoxyribonucleic acid (DNA), a second primer which hybridizes to a 3′ strand of the strand of DNA, and a third primer which hybridizes to said 3′ strand of DNA; producing by nested polymerase chain reaction (PCR) using the first primer, the second primer, and the third primer, an outer amplicon which includes a target sequence and an inner amplicon which excludes the target sequence; forming at least one of the following: a ligatible structure which includes a 3′-5′ sequence which excludes the target sequence hybridized to a 5′-3′ sequence which includes the target sequence, and a sequencible structure which includes a 5′-3′ sequence which excludes the target sequence hybridized to a 3′-5′ sequence which includes the target sequence; sequencing the sequencible structure(s) and ligating the ligatible structure(s) with a labeled oligonucleotide; and analyzing the sequencing products and ligation products with a DNA sequencer to determine the genotype of said individual. The method may form both the ligatible structure and the sequencible structure. The target sequence(s) may include a single nucleotide polymorphism.

The invention also provides a method of genotyping the deoxyribonucleic acid (DNA) of an individual by analyzing at least one target sequence in the DNA. The steps of the method include: obtaining a first primer which hybridizes to a 5′ strand of a strand of said DNA, a second primer which hybridizes to a 3′ strand of the strand of DNA, and a third primer which hybridizes to the 3′ strand of DNA; producing by nested polymerase chain reaction (PCR) using the first primer, the second primer, and the third primer, an outer amplicon which includes said target sequence and an inner amplicon which excludes the target sequence; forming at least one of the following: a ligatible structure which includes a 3′-5′ sequence which excludes the target sequence hybridized to a 5′-3′ sequence which includes the target sequence, and a sequencible structure which includes a 5′-3′ sequence which excludes the target sequence hybridized to a 3′-5′ sequence which includes the target sequence; sequencing the sequencible structure and ligating the ligatible structure with a labeled oligonucleotide; and analyzing the sequencing and ligation products so produced with a DNA sequencer to determine the genotype of the individual. The forming step of the method may form both the ligatible structure and the sequencible structure. The target sequence(s) may include a single nucleotide polymorphism. The sequencible structure may be sequenced by a technique such as dideoxy sequencing, pyrosequencing, or single base extension.

In a preferred embodiment of the method, i) a plurality of target sequences is analyzed; ii) the step of forming forms an ordered series of sequencible structures; iii) the step of sequencing produces an ordered series of sequencing products of varying, non-overlapping length; and iv) the step of analyzing is carried out by electrophoresing the ordered series of sequencing products in a single channel of said DNA sequencer. In some embodiments, the step of sequencing is carried out by single base extension (SBE). In other embodiments, the step of sequencing is carried out by a dideoxy sequencing reaction utilizing a ratio of dNTPs to ddNTPs that is lower than that which is typically used, in order to produce a short sequence read.

In a preferred embodiment of the present invention, the labeled oligonucleotide is fluorescently labeled.

In one embodiment of the method, i) a plurality of target sequences is analyzed; ii) the labeled oligonucleotides are degenerate; iii) the ligation products are of varying, non-overlapping lengths; and iv) the step of analyzing is carried out by electrophoresing a plurality of ligation products in a single channel of said DNA sequencer.

The present invention also provides a method for analyzing at least one target site in a DNA molecule. The method includes the steps of: 1) amplifying the target site(s) by nested PCR (the nested PCR is carried out using inner and outer PCR primer pairs, wherein the outer PCR primer pair forms a first PCR product which contains a target site, and the inner PCR primer pair forms a second PCR product which contains a portion of the first PCR product but does not contain the target site); 2) denaturing the first and said second PCR products to form ssDNA sequences; 3) reannealing the ssDNA sequences to form a sequencible hybrid DNA molecule and a ligatible hybrid DNA molecule; 4) performing sequencing reactions with said sequencible hybrid DNA molecule and ligation reactions with said ligatible hybrid DNA molecule; and 5) determining the characteristics of the target sequence by analyzing results obtained in the performing step. In a preferred embodiment of the method, the target site(s) may be SNP polymorphism sites. Further, the inner and outer primer pairs may comprise a sequence tag, which in turn may comprise a restriction enzyme recognition site. In some embodiments of the method, the step of amplifying may be carried out using a low concentration of primers, and may further comprise a second step of amplification using secondary primers for amplification of the sequence tags. According to the method, one or several target sites may be analyzed. The step of amplifying may be carried out in a single multiplex PCR reaction, or in multiple independent PCR reactions, and the results may be analyzed by a DNA sequencer.

The invention also provides inner and outer PCR primer pairs for the amplification of a target sequence in a DNA molecule, wherein the outer PCR primer pair forms a first PCR product which contains the target sequence, and the inner PCR primer pair forms a second PCR product which contains a portion of the first PCR product but does not contain said target sequence. The target sequence may be an SNP polymorphism site. The primer pairs may comprise a sequence tag.

The present invention also provides a kit for amplification of at least one target sequence in a DNA molecule. The kit includes inner and outer PCR primer pairs for the amplification of the target sequence. The outer primer pair amplifies a portion of the DNA molecule which includes the target sequence, and the inner primer pair amplifies part of the same portion of the DNA molecule that is amplified by the outer primer pair, but excludes the target sequence. The inner and outer PCR primer pairs may comprise sequence tags. The kit may further comprise secondary primers for the amplification of the sequence tags.

The present invention also provides a dideoxy DNA sequencing kit for producing short chain termination fragments. The kit includes dNTPs and ddNTPs which are present in a low dNTP:ddNTP ratio.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1. A schematic drawing illustrates a nested PCR design that would create both a sequencible and a ligatible structure. The reaction uses two primer sets (inner and outer). P1, forward outer and forward inner primer; P2, reverse inner primer; P3, reverse outer primer. When this design is multiplexed, multiple sequencible and ligatible structures are formed. These structures can be used for genotyping reactions.

FIGS. 2A and B. A, a multiplex PCR scheme that amplifies three target sites. Two primer sets (inner and outer) are used for the amplification of each target site. The target sites are represented by “x”. All primers contain two domains, a common domain and a target specific domain. The target specific domains, indicated by arrows, have sequences unique to the targets, whereas the common domains are not target specific.

=common domain of forward inner and forward outer primers; ▮=common domain of inner reverse primers; - - - =common domain of reverse outer primers. B, close-up view of common domain of inner primer shows restriction sites incorporated into the sequence of the domain.

FIGS. 3A and B. A. Depiction of nested PCR primer design. The sequence (SEQ ID NO. 1) represents 393 nucleotides from human chromosome 6. Nucleotides corresponding to primer pairs are shaded with gray, and arrows indicate primer orientation. Arrow indicates site of polymorphism. The sequence which was generated by ABI 377 sequencer is shown in italics, and is also depicted in FIG. 3B (SEQ ID NO. 2). The sample is an A/A homozygote.

FIGS. 4A and B. A. Depiction of nested PCR primer design. The sequence (SEQ ID NO. 3) represents 494 nucleotides from human chromosome 6. Nucleotides corresponding to primer pairs are shaded with gray, and arrows indicate primer orientation. Arrow indicates site of polymorphism. The sequence which was generated by ABI 377 sequencer is shown in italics, and is also depicted in FIG. 4B (SEQ ID NO. 4). Two samples are shown in FIG. 4B. The upper panel is a heterozygote (C/T): at base position 24, as indicated by the arrow, both bases (C and T) were identified. The lower panel is a homozygote (T/T): at the indicated position only the T base was identified. The example demonstrates that a single nucleotide polymorphism (SNP) can be identified by sequencing from the sequencible structure formed from the nPCR design.

FIG. 5. SBE genotyping using sequencible structures from nPCR products amplified by Pfu DNA polymerases. Nested PCR primers were designed for an SNP marker from human chromosome 6. The inner product size is 189 bp; its 3′ end is positioned immediately upstream of the polymorphic site when it forms sequencible structures. PCRs were performed with DNA samples of known genotypes. After PCRs the outer and inner products were purified and denatured together to form sequencible structures. SBE reactions were performed and products analyzed in ABI 377 Sequencer using Genescan. Three samples are presented. The upper panel is a homozygous A/A where an SBE product of 190 bp is found (arrow). Products of the same size are found in the heterozygous (middle panel) and the other homozygous (lower panel) samples. The peaks after the dye front are ROX500 size markers (35, 50, 75, 100, 139, 150, 160, 200, 250 bp respectively).

FIG. 6. An example illustrates a short sequence read with a specially assembled sequencing mixture that has a different ddNTPs/dNTPs ratio. The components used for the example were the following: 1.75 μM of −21 M13 primer, 300 ng of pGEM, 60 μM of each dNTP, 2.5 μM of R6G-acyclo-ATP, 12 μM of ROX-acyclo-CTP, 10 μM of BODIPY-Fluorescein-acyclo-GTP, 25 μM of TAMRA-acyclo-UTP, 1 unit of AcycloPol DNA polymerase and 1× AcycloPol reaction buffer. The thermal cycling conditions were 96° C. for 3 min., followed by 25 cycles of 96° C. for 1 min., 50° C. for 15 sec. and 60° C. for 4 min.

FIG. 7. Primer designs affected multiplex PCR and nPCR. Lane 1 was a 4× multiplex PCR with simple PCR design, Lane 2 was the same 4 amplicons but with the two domain design, and Lane 3 was the 4× multiplex nPCR with the same amplicons performed in two sequential reactions. In the first reaction, equal amounts of primary primers at 10 nM were used. The cycling condition was 95° C. for 10 min, followed by 10 cycles of 95° C. for 45 sec., 65° C. for 5 sec., ramping to 55° C. at 0.1° C./sec and 55° C. for 90 sec. For the second reaction the primers used were different among the three lanes. Lane 1, since it did not have secondary primers, used the same primers as in the first reaction but with the concentration increased to 100 nM. Lane 2 used two secondary primers, M13 forward and reverse, each at 400 nM. Lane 3 used three primers, M13 forward (600 nM), M13 reverse (400 nM) and the common tag for inner PCR primers (400 nM). The cycling conditions for the second reaction were as follows: 95° C. for 10 min, followed by 30 cycles of 95° C. for 45 sec. and 55° C. for 90 sec.

DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENTS OF THE INVENTION

The present invention provides methods of high-throughput SNP genotyping that are highly cost-effective. The methods utilize the high-throughput capability of ubiquitous and readily accessible DNA sequencers in order to carry out the SNP analysis. This is possible due to the experimental design of the methods, which are based on nested PCR reactions. In a preferred embodiment, the target sites which are analyzed are SNP polymorphism sites.

However, those of skill in the art will recognize that other target sites may also be analyzed by the methods of the present invention. For example, short insertions and deletions, and closely spaced multiple SNPs can be identified by the multiple target sequencing approaches disclosed herein.

As is well-known to those of skill in the art, nested PCR is a PCR amplification method that “nests” one PCR reaction within another. Two sets of primers are used: “outer” primers amplify a portion of a DNA strand, and “inner” primers amplify a smaller portion of the same DNA that is located within the larger, outer portion. Thus, two PCR products (amplicons) are produced. One amplicon is smaller than the other and contains a “subset” of the bases contained in the larger, outer amplicon.

In the practice of the present invention, a new design for nested PCR is used. Primers are designed so that the outer amplicon includes a target site (e.g. an SNP polymorphism site) and the inner amplicon excludes the target site. The purpose of the inner PCR in the nested reaction is to synthesize a DNA fragment whose size and position relative to the outer PCR can be precisely controlled. This design concept is illustrated in FIG. 1. As can be seen in FIG. 1, the inner primer set is composed of primers P1 and P2, and the outer primer set is composed of primers P1 and P3. (In this example, Primer P1 is utilized in both primer sets.) The P1-P3 outer primer set flanks the indicated target site whereas the P1-P2 inner primer pair does not. PCR amplification of the DNA with these primers will produce two amplicons. The larger of the two is the outer amplicon formed by amplification with the P1-P3 pair; this amplicon includes the target site. The smaller of the two amplicons, the inner amplicon, contains base pairs which lie within the bounds of the larger outer amplicon (i.e. is “nested” within the larger amplicon), but does not contain the target site. The ultimate product of the PCR reaction is double strand DNA of two lengths, only one of which (the longer) contains the target site, and the shorter of which contains a subset of the bases contained in the longer strand.
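The nesting constraint described above can be sketched as a simple coordinate check. This is only an illustration: the `Amplicon` class, the function name and all coordinates are hypothetical and are not taken from the figures or sequence listings.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Amplicon:
    """Half-open interval [start, end) on the template, in 0-based bases."""
    start: int
    end: int

    def contains(self, pos: int) -> bool:
        return self.start <= pos < self.end

    def length(self) -> int:
        return self.end - self.start


def nested_design_ok(outer: Amplicon, inner: Amplicon, target_pos: int) -> bool:
    """True when the inner amplicon lies within the outer amplicon, the outer
    amplicon contains the target site, and the inner amplicon excludes it."""
    nested = outer.start <= inner.start and inner.end <= outer.end
    return nested and outer.contains(target_pos) and not inner.contains(target_pos)


# Hypothetical coordinates: P1 starts at base 100, the P2 product ends just
# before base 289, and the P3 product ends just before base 400.
outer = Amplicon(start=100, end=400)  # P1-P3 product
inner = Amplicon(start=100, end=289)  # P1-P2 product
print(inner.length(), nested_design_ok(outer, inner, target_pos=289))
```

With these assumed numbers the inner product is 189 bases long and the target site is the first base beyond its 3′ end, so the check passes; moving the target inside the inner interval makes it fail.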

In a preferred embodiment of the invention, the same primer functions as both the outer and inner forward primer. However, those of skill in the art will recognize that this need not be the case. With reference to FIG. 1, a distinct forward inner primer (“P4”, shown in parentheses) might be designed to anneal between P1 and P2 and in the orientation of primer P1. Because the shorter amplicon contains bases which are identical to a portion of the longer amplicon, when the ds strands are denatured and allowed to reanneal, some (approximately 50%) of the shorter single strand DNA of the inner amplicon will hybridize with matching portions of the single strand DNA of the outer amplicon. This cross hybridization between the two amplicons creates two different inner-outer “hybrid” structures as depicted in FIG. 1. In one of the two structures, the “ligatible structure” of FIG. 1, a 3′ overhang will be present, the bases of which originated from the outer amplicon and contain the target site. This structure can be used, for example, for ligation based applications. By “ligation based applications” we mean the smaller amplicon is one of the two or more pieces of DNA that are joined together by a DNA/RNA ligase. In the other structure, the “sequencible structure” of FIG. 1, a 5′ overhang will be present, the bases of which originated from the outer amplicon and also contain the target site. This second type of structure will be directly amenable to sequencing applications. By “sequencing applications” we mean that the smaller amplicon acts as a sequencing primer to sequence the DNA downstream from it. The sequencing can be only one base in length (i.e. SBE) or in the tens or in the hundreds of bases. Thus, in a single nested PCR reaction followed by denaturation and annealing, it is possible to generate both sequencible and ligatible structures. Of course, some (approximately 50%) of the single strand DNA will renature to form the original ds PCR products (“structures same as products” in FIG. 1).

Years of improvement in primer design and thermocycling conditions have increased the success rate for single target PCR reactions to greater than 90%. However, multiplex reactions, such as those in an nPCR reaction, have a much lower success rate and extensive optimization is typically required, usually because it is very difficult to obtain amplified products in equal amounts. The usual solution is to increase the primer concentration for the weakly amplified target(s). This suggests that it is annealing efficiency that is responsible for the unequal amplification. In U.S. Pat. No. 5,882,856 to Shuber, the complete contents of which are herein incorporated by reference, an alternative solution is proposed in which all primers utilized have two domains, one which is target specific as in regular PCR designs, and one which is a “common domain”. For a set of PCR primers to be multiplexed, two common domains are utilized. A first common domain is a sequence shared by all forward primers and is attached at the 5′ end of the target specific domain. A second common domain sequence is shared by all reverse primers and is attached at the 5′ end of the target specific domain. The common domain facilitates annealing of different primer sets because the common domain has a higher annealing temperature than the target specific domain.

In the practice of the present invention, common sequence domains or “sequence tags” as taught by Shuber may be utilized for inner and/or outer PCR primers. The common sequence tag for the outer PCR primers would help in multiplexing PCR so that amplification would result in the production of equal amounts of product. The common sequence tag for the inner primers also serves to even the levels of PCR products, and in addition may provide restriction sites that can be used to generate precise fragment ends for genotyping reactions such as sequencing and/or ligation reactions. The combination of incorporating special restriction enzyme recognition sequences in such a sequence tag, and the deliberate positioning of the inner PCR primers, makes it possible to precisely tailor the resulting PCR products for any of a variety of purposes, including but not limited to primer extension, sequencing, ligation, pyrosequencing, structure specific cleavage, and other genotyping reactions.

Those of skill in the art will recognize that placement of the P2 primer can be anywhere between the target site and the other primer of the inner primer pair, so long as a useful PCR product is produced by PCR amplification with the inner primer pair, and the target site itself is not amplified. The positioning of the P2 primer relative to the target SNP may vary depending on the mechanism of the genotyping reactions. For example, for SBE reactions, the end of the primer should be immediately 5′ to the SNP site. For Multi-Target Sequencing (MulTarSeq) reactions, there may be a gap between the end and the target SNP site. In a preferred embodiment, for MulTarSeq reactions the gap may be from about 1 to about 50 bases, or more preferably from about 5 to about 20 bases, and most preferably from about 10 to about 15 bases. Further, as described above, the desired length of the final product from the amplification of PCR primers containing common sequence tags may also be determined by the presence of a restriction enzyme cleavage site within the tag.
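As a rough illustration of the placement rule for P2, the gap between the 3′ end of the inner product and the SNP can be classified against the ranges stated above. The function name and return strings are hypothetical, and the "about" ranges of the text are treated here as hard cutoffs purely for the sake of the example.

```python
def classify_gap(inner_last_base: int, snp_pos: int) -> str:
    """Classify a P2 placement by the number of bases lying between the last
    base of the inner product and the SNP (0-based positions on one strand)."""
    gap = snp_pos - inner_last_base - 1
    if gap < 0:
        return "invalid: inner product would cover the SNP"
    if gap == 0:
        return "SBE-compatible"  # 3' end immediately 5' to the SNP
    if 10 <= gap <= 15:
        return "MulTarSeq, most preferred gap"
    if 5 <= gap <= 20:
        return "MulTarSeq, more preferred gap"
    if gap <= 50:
        return "MulTarSeq, within stated gap range"
    return "gap larger than the stated range"


# Hypothetical positions: SNP at base 289 of the template.
for last_base in (288, 276, 280, 250, 200, 289):
    print(last_base, "->", classify_gap(last_base, snp_pos=289))
```

The checks run from the narrowest range outward so that a gap of 12 bases, for instance, is reported under the most preferred range and not the wider ones that also contain it.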

In yet another embodiment of the present invention, the amplification reactions may be carried out in a step-wise fashion using primary and secondary primers. In this embodiment, a single PCR is divided into two sequential reactions, the first of which uses the primary primer set and the second of which uses the secondary primer set. The primary primers may be designed to have two domains (target specific and common) as described above. The secondary primer set contains only the common tails from the primary primers.
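The relationship between the two-domain primary primers and the secondary primers can be sketched as follows. All sequences and names below are invented placeholders chosen only to show the 5′ tail plus 3′ target-specific layout; they are not the primers used in the examples.

```python
# Hypothetical common tails (one shared by all forward primers, one by all
# reverse primers). Placeholder sequences, not real tag sequences.
FWD_TAIL = "ACGTACGTACGTACGTAC"
REV_TAIL = "TGCATGCATGCATGCATG"


def primary_primer(common_tail: str, target_specific: str) -> str:
    """Build a two-domain primer: common tail at the 5' end, followed by the
    target-specific domain at the 3' end."""
    return common_tail.upper() + target_specific.upper()


# Hypothetical target-specific domains for two amplicons (forward, reverse).
targets = {
    "snp_a": ("GATTACAGATTACAGATT", "CCATGGTTAACCGGTTAA"),
    "snp_b": ("TTGACCATGGCATCAGTA", "AAGGTCCATTGGACCTAG"),
}

primary = {
    name: (primary_primer(FWD_TAIL, fwd), primary_primer(REV_TAIL, rev))
    for name, (fwd, rev) in targets.items()
}

# The secondary set is a single pair made of the common tails alone, so one
# pair can amplify every product of the primary reaction.
secondary = (FWD_TAIL, REV_TAIL)

for name, (fwd, rev) in primary.items():
    print(name, fwd, rev)
print("secondary:", secondary)
```

Because every primary product carries the same two tails at its ends, the secondary reaction needs only this one primer pair regardless of how many targets are multiplexed.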

For the first reaction, the primary primers are used in only limited but equal amounts, e.g. about 1-5% of the amount for a regular PCR, i.e. about 0.1 to about 100 nM. The rationale is that by limiting the amount of primers for each set multiplexed in the reaction, the system is forced to produce limited but equal amounts of products for all amplicons. When the more robust primers are used up (because of the limited amount available) the system will work with the less efficient primers. The purpose of the primary PCR is to amplify enough templates from the genomic DNA to carry out the secondary PCR. Since all primary primers have limited but equal amounts, this forces the primary PCR to produce limited but near equal amounts of templates for the secondary PCR. For the secondary PCR, the secondary primers, consisting of only one set for all primary amplicons, are used. In essence this secondary reaction is a single PCR that amplifies multiple templates. The net result of this two-step PCR process is to compensate for differences in efficiency across PCR reactions.

The purpose of the inner PCR in the nested reaction is to synthesize a DNA fragment whose size and position relative to the outer PCR can be precisely controlled. By altering the position of the P2 primer relative to P1, and by optionally including sequence tags with restriction sites, it is possible to design longer or shorter PCR fragments. The synthesized fragments can then be used in any of a variety of genotyping reactions, including but not limited to primer extension, sequencing, ligation, pyrosequencing, structure specific cleavage, and the like.

For example, the sequencible structure fragments may be used as extension primers in SBE and MulTarSeq reactions. The direct production of sequencible structures by the methods of the present invention thus permits investigators to take advantage of the ready availability of DNA sequencers for the purposes of genotyping. Relatively long primers can be synthesized in this manner in a feasible and much more cost effective way than, as is typically done, by chemical synthesis. For example, the length of such a fragment can range from about 35 to about 1000 bps, and is preferably in the range of about 50 to about 650 bps. To use a standard sequencing reaction to determine the sequences surrounding a polymorphic site, it is simply necessary to add an appropriate sequencing mixture without using any sequencing primer, because the sequencible structure provides the free 3′ end for the sequence reaction.

Those of skill in the art will recognize that several methods of sequencing are well established, and that new sequencing methodologies are continually being developed. In the practice of the present invention, any suitable method of sequencing may be utilized. Examples of such methods include but are not limited to Sanger (dideoxy)-based sequencing methods and pyrosequencing.

The procedure illustrated in FIG. 1 amplifies a single target and relies on multiplex PCR amplification since more than one pair of primers is used in a single reaction. However, those of skill in the art will recognize that in many cases it will be desirable to amplify more than one target site at a time. In this case, the design of the inner primer pairs for each target site may be such that an ordered series of PCR products of varying lengths is produced, i.e. the sequencible structure formed for each target site differs in length from the sequencible structures formed for each of the other target sites. (See FIG. 2A). For example, an ordered series of sequencible structures from about 50 to about 500 bps in length can be designed, thus adding an additional distinguishing feature to each of the amplified target sites. Because of this feature, multiple sets of PCR products can be sequenced in a single reaction provided that the length of sequencing is limited so that their chain termination fragments do not overlap in a sequencing gel. This would allow the sequencing of multiple targets in a single reaction and analysis in a single channel (e.g. a lane or capillary).
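The non-overlap condition for chain termination fragments can be illustrated with a minimal sketch. The function names, the 50-base spacing and the read lengths used here are assumptions chosen for illustration only; the sketch simply treats each sequencible structure of length n as occupying fragment sizes n+1 through n+read_length in the channel.

```python
def read_windows(structure_lengths, read_length):
    """Fragment-size window (first, last) occupied by each target's termination fragments."""
    return [(n + 1, n + read_length) for n in sorted(structure_lengths)]


def windows_overlap(windows):
    """True if any window runs into the next longer one."""
    return any(prev_end >= next_start
               for (_, prev_end), (next_start, _) in zip(windows, windows[1:]))


# Hypothetical ordered series: sequencible structures of 50, 100, ..., 500 bases.
lengths = list(range(50, 501, 50))

short_reads = read_windows(lengths, read_length=25)
long_reads = read_windows(lengths, read_length=60)

print(windows_overlap(short_reads))  # reads limited to 25 bases
print(windows_overlap(long_reads))   # reads of 60 bases exceed the 50-base spacing
```

Under these assumptions, a 25-base read leaves the ten windows separate, whereas a 60-base read would run each window into the next.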

Those of skill in the art will recognize that several ways exist to control the length of reading for sequencing reactions. For example, the length of reading can be adjusted by changing the ratio of ddNTP/dNTP in the reaction mixture. For standard Sanger sequencing reactions, the ratio is typically about 1:100, a ratio intended to promote relatively long reads (e.g. about 600-800 bps). Those of skill in the art will recognize that, by systematically altering and testing the ratio, it is possible to find the optimal reading length for a given application. For most SNP genotyping purposes, a read of from about 1 to about 50, and preferably from about 1 to about 25 bases would be sufficient, requiring a ddNTP/dNTP ratio higher than the standard ratio, preferably in the range of about 1:0 to about 1:10, and more preferably about 1:0 to about 1:5. For other applications, the optimal length can be attained by adjusting the ratio, i.e. the ratio may be optimized by methods which are well known to those of skill in the art in order to achieve the desired sequence read length. Precise control of the read allows very high level multiplexes to be carried out (e.g. 100 to 200×). In order to carry out the practice of the present invention, primer design involving placement of the primer and the optional addition of a sequence tag must be carried out.
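The dependence of read length on the ddNTP/dNTP ratio can be sketched with a simple geometric model. This is an idealized illustration only: it assumes the polymerase incorporates ddNTPs and dNTPs with equal efficiency, which real enzymes generally do not, so the absolute numbers will not match commercial kits and only the trend is meaningful. All function names are hypothetical.

```python
def termination_probability(ddntp, dntp):
    """Chance that a given extension step incorporates a terminator (idealized model)."""
    return ddntp / (ddntp + dntp)


def mean_read_length(ddntp, dntp):
    """Expected number of bases added before termination (geometric distribution)."""
    return 1.0 / termination_probability(ddntp, dntp)


def fraction_terminated_within(ddntp, dntp, n_bases):
    """Fraction of extension products that have terminated within n_bases."""
    p = termination_probability(ddntp, dntp)
    return 1.0 - (1.0 - p) ** n_bases


# ddNTP:dNTP ratios mentioned in the text, written as (ddNTP, dNTP) parts.
for ratio in [(1, 0), (1, 5), (1, 10), (1, 100)]:
    print(ratio, round(mean_read_length(*ratio), 1),
          round(fraction_terminated_within(*ratio, 25), 3))
```

In this model a ratio of 1:0 terminates every product after a single base, as in SBE, and ratios of 1:5 to 1:10 terminate most products within about 25 bases, while a ratio of 1:100 lets extension proceed much further.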

Further, those of skill in the art will recognize that other factors influence primer design and must be taken into account as well. For example, the exact sequence of bases will influence the strength of the hybridization of the primer with its cognate DNA (the sequence of bases to which it binds during PCR) and thus must be taken into account when planning PCR amplification protocols; the possibility of primer-dimer formation is taken into account, as is the potential for the formation of secondary structure, etc. Those of skill in the art are well acquainted with such issues and numerous resources exist to aid the skilled practitioner in these various aspects of primer design.

As discussed above, the common domains of the inner primers may be designed to include restriction enzyme recognition sites. The restriction enzymes preferred are those that cleave DNA strands downstream from their recognition sites and generate blunt-ended products. For sequencing applications, restriction enzymes that cleave DNA downstream and generate 3′ overhangs can also be used. For ligation applications, restriction enzymes that cleave DNA downstream and generate 5′ overhangs can also be used. Examples of such enzymes include but are not limited to: Mly I, BstF5 I, Bpm I, Eco57 I, Bsg I, Eci I, Bst71 I, Alw26 I, Ksp632 I, BceA I, etc. The other nucleotides in the sequence of a common domain inner primer are selected to balance the melting temperature with that of the common domain of the outer primers. With respect to the design of common domain sequences for outer primers (i.e. those which do not contain restriction sites), the sequences are selected so that they match the melting temperature of the common domain of the inner primer.

In general, the generation of the sequencible and ligatible structures of the present invention will be carried out as follows: a PCR amplification reaction is carried out, followed by a restriction digestion with a suitable restriction enzyme if a restriction site was incorporated into a common sequence tag of the inner primers. This is followed by treatment to “clean up” the reaction, e.g. with shrimp alkaline phosphatase (SAP) and exonuclease I (Exo I), to degrade excess PCR primers and dNTPs left over from the PCR reaction, and to remove the overhanging ends of restriction digestion. (Alternatively, filtration via sephadex gel may also be employed.) After SAP/Exo I digestion, a high temperature denaturation step is performed. This denaturation step serves two purposes: one is to inactivate SAP, Exo I and the restriction enzyme; the other is to completely denature double stranded DNA. During the cooling process that follows, the four single stranded DNA fragments from the two PCR products will form four kinds of structure. Two of them are exactly the same as the PCR products and are both blunt ended. Two new structures are formed between the complementary strands of the large and small PCR products, one of them with a 5′ end single stranded, the other one with a 3′ end single stranded. The one with a 5′ end single stranded provides a free 3′ end for sequencing reaction/primer extension and for ligation, and is referred to as a “sequencible structure”. The one with a single stranded 3′ end provides a free 5′ end for ligation, and is referred to as a “ligatible structure”. These four structures will have equal concentrations if the quantity of the two PCR products is equal. In this case, the sequencible and ligatible structures will represent about 50% of the reannealed products. This amount is sufficient for both sequencing/primer extension and ligation reactions to make new products.
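The 50% figure can be reproduced with a short calculation. The sketch below assumes that, on reannealing, each strand pairs at random with any complementary strand regardless of its length; this random-pairing assumption and the function name are illustrative and are not taken from the experimental work.

```python
def reannealed_fractions(outer_amount, inner_amount):
    """Expected fraction of each duplex type after denaturing and reannealing,
    assuming random pairing of complementary strands (illustrative assumption)."""
    total = outer_amount + inner_amount
    f_outer = outer_amount / total
    f_inner = inner_amount / total
    return {
        "outer/outer (blunt)": f_outer * f_outer,
        "inner/inner (blunt)": f_inner * f_inner,
        "sequencible": f_outer * f_inner,
        "ligatible": f_outer * f_inner,
    }


equal = reannealed_fractions(1.0, 1.0)
unequal = reannealed_fractions(3.0, 1.0)

print(equal["sequencible"] + equal["ligatible"])      # equal amounts of the two products
print(unequal["sequencible"] + unequal["ligatible"])  # 3:1 excess of the outer product
```

With equal amounts of outer and inner products, each of the four structures is expected at 25%, so the sequencible and ligatible structures together make up 50%; under the same assumption, an imbalance between the two products lowers that share.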

In some embodiments of the present invention, a single target site is analyzed by the methods of the present invention, and the PCR step is carried out by multiplex PCR, i.e. amplification with both the inner and outer primers is carried out in the same reaction. However, those of skill in the art will recognize that two separate PCR reactions may also be carried out, i.e. the inner and outer PCR reactions are carried out separately. In this case, the PCR products from the two reactions may be combined later at some point prior to the renaturation step so that the single strands can reanneal.

Similarly, in other embodiments of the present invention, it may be desirable to PCR amplify more than one target site at a time. In some cases, this may be carried out by a multiplex PCR reaction in which all target sites of interest are amplified simultaneously with their inner and outer primers in a single reaction. Alternatively, one or more sites may be amplified together, or each site may be amplified alone, either in a single reaction with both inner and outer primers, or in two reactions, one for the inner and one for the outer primers as described above. Any combination of PCR amplification reactions may be utilized in the practice of the present invention, so long as the procedure results in the production of suitable sequencible and/or ligatible structures.

In a preferred embodiment, the present invention provides methods which ultimately utilize DNA sequencers or the like to detect the products formed from sequencible and/or ligatible structures created by the present methods. Those of skill in the art will recognize that many suitable techniques exist by which the sequencible products can be produced for analysis by a DNA sequencer. Examples include but are not limited to Sanger-type sequencing reactions, single base extension (SBE), and pyrosequencing. Ligatible structures may be analyzed directly on a DNA sequencer after a labeled oligonucleotide (e.g. fluorescently labeled) is joined to the 5′ phosphate of the ligatible structure via a ligation reaction.

Those of skill in the art are well acquainted with Sanger-type sequencing reactions, which are based on the interruption of enzymatic extension of the DNA chain at the 3′ hydroxyl via incorporation of 2′,3′-dideoxy analogs. Four sets (one for each dideoxy analog) of such chain-terminated fragments of differing lengths are generated and “read” (detected) by a DNA sequencer to yield a sequence. Typically, the fragments are labeled with radioactivity or fluorescent tags for detection.

Single base extension is essentially a sequencing reaction, the difference being that SBE uses only dye-terminators, resulting in the extension of the primer by only a single base. As described above, in the practice of the present invention, the length and position of the extension primer relative to the polymorphic site may be controlled so that it is possible to create a primer which anneals immediately upstream of a polymorphic site. Then, in a regular SBE reaction, the polymorphic base can be determined by identifying which base(s) is incorporated into the primer. If several target sites are to be analyzed, a set of primers can be designed which, upon amplification of the targeted DNA, produce an ordered set of such sequencible structures of varying, non-overlapping lengths. When such structures are extended by dye-labeled ddNTPs, an ordered series of SBE products of varying, non-overlapping lengths is produced. In the proposed nested PCR design, because extension primers from about 50 to over 500 bases can be obtained, it is possible to separate these SBE products in a sequencing gel/capillary. A regular sequencing gel can sequence about 500-800 bases. Therefore, over 100 SNPs can be analyzed in a single channel by loading their corresponding SBE products in one lane/capillary when the extension primers are designed to be positioned, for example, every 4-5 bases apart for a full range of 500-600 bases.
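The figure of over 100 SNPs per channel follows from simple arithmetic, sketched below. The function name is hypothetical, and the calculation assumes one extension primer length per SNP with primer lengths evenly spaced across the usable size range.

```python
def lane_capacity(size_range, spacing):
    """Number of distinct primer lengths that fit across `size_range` bases
    when consecutive primers differ by `spacing` bases (both ends included)."""
    return size_range // spacing + 1


print(lane_capacity(500, 5))  # 5 bases apart over a 500-base range
print(lane_capacity(600, 4))  # 4 bases apart over a 600-base range
```

Even the more conservative combination of the spacings and ranges given above leaves room for just over 100 SBE products in one lane or capillary.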

In order to successfully carry out SBE reactions, the ends of the primers must be precise, i.e. it is necessary to control precisely where the inner amplicon ends. The use of Taq DNA polymerase may in some cases be disadvantageous due to its known extendase activity which interferes with such precision. Those of skill in the art will recognize that a variety of means are available to overcome such a problem.

For example, high fidelity polymerases may be employed. Pfu DNA polymerase is a well characterized DNA polymerase that has been shown to have high fidelity of replication (Cline 1996). It is known that Pfu DNA polymerase does not add extra A's at the 3′ end (Cline 1996; Hu 1993). Based on a study by Cline et al (Cline 1996), the use of Pfu DNA polymerase gives precise ends of inner PCR products.

A second alternative is to use ribonucleotides and/or abasic analogs in the inner primer to control the end point of the DNA polymerase. Some DNA polymerases do not have reverse transcriptase activity; Pfu and Vent polymerases are examples (Perler 1996). These enzymes cannot incorporate any nucleotide when they encounter a ribonucleotide base in a template. Coljee et al recently reported a strategy to create precise overhangs at the ends of PCR products by using ribonucleotides (Coljee 2000). The strategy inspired us to use the same mechanism to control the ends of inner PCR products. Thus, inner primers may be designed to overlap the target SNP site, with a ribonucleotide used at the SNP site instead of a regular deoxyribonucleotide. DNA polymerases can use a chimeric oligonucleotide as primer efficiently (Gal 2000; Stump 1999), so the use of chimeric primers would not compromise PCR efficiency. When the complementary strands are synthesized, the DNA polymerases will stop one base before the ribonucleotide base in the chimeric inner primers, creating very accurate and precise ends for SBE. It is also possible to create chimeric primers containing several ribonucleotide bases to create an overhang of several bases for the purpose of ligation (Coljee 2000). Similarly, a nucleotide analog that does not contain the base, the abasic analog, has similar effects on DNA polymerase (Gal 2000; Gal 1999) and may be used.

Yet another alternative is to use restriction enzymes to produce precise 3′ ends for sequencible structures. There are some restriction enzymes that cleave DNA strands downstream from their recognition sites, e.g. Ksp632 I, Alw26 I, and Bst71 I. As discussed above, to take advantage of these features, a common domain (sequence tag) may be added to the 5′ end of the inner PCR primers. The common domains contain restriction enzyme sites which can be positioned so that the 3′ extending ends are located just before the polymorphic sites after the tail is cleaved off. The use of a common domain for inner primers could also help to even out the amounts of the different amplicons when multiplexed.

Pyrosequencing is a unique technique that incorporates a sequencing reaction and real time detection together. Since, as in a regular sequencing reaction, pyrosequencing uses DNA polymerase, the free 3′ termini in the sequencible structures can function as extension primers. Since pyrosequencing is a real time system, it is not amenable to the analysis of multiplex reactions. There would thus be no need to use a common tail for the inner PCR primers. The primers for a reaction whose products are destined for analysis by pyrosequencing are designed to accommodate the needs of pyrosequencing, preferably from about 5 to about 8 bases upstream of the target polymorphic site.

The ligatible structures produced by the methods of the present invention have a free 5′ phosphate. If a common tag is included in the design, when the common tag is removed from the inner PCR products, restriction enzyme digestion leaves the 5′ phosphate on the ligatible structure. To make such a ligation site at a polymorphic site, the cleavage site of restriction enzymes used in the inner PCR primer must be immediately downstream of the polymorphic site on the complementary strand. Those of skill in the art will recognize that, because all of these restriction enzymes produce a 5′ overhanging strand, a restriction site design suitable for a ligation assay would not be suitable for an SBE reaction, but could be used for a MulTarSeq reaction. Further, one advantage of the use of ligation to score a polymorphism is that it can be combined directly with the PCR reaction, resulting in a closed tube assay.

To be detected by a DNA sequencer, DNA fragments have to be labeled with fluorescent dyes. In the present nested PCR design, the PCR products are not labeled with any fluorescent dye. Thus, the ligatible structures can be detected only if they are ligated to a labeled oligonucleotide, e.g. one that bears a fluorescent label. To be ligated onto a ligatible structure created by restriction enzyme digestion and subsequently detected by a DNA sequencer, an oligonucleotide would have a 3′ OH group and a 5′ fluorescent label. Current DNA sequencers typically analyze fluorescence with a spectrum between 510-610 nm. Those of skill in the art will recognize that there are many fluorescent dyes available in this range. For example, the BODIPY series of fluorescent dyes have been shown to minimize emission overlaps between dyes (Metzker 1996). Examples of other dyes which may be utilized to carry out this aspect of the invention include but are not limited to FAM, fluorescein, R110, R6G, TAMRA, ROX, Texas red, etc.

If only a single locus of interest is to be analyzed, locus-specific fluorescence-labeled oligos may be utilized. However, in order to carry out high level multiplex reactions, sets of degenerate, fluorescence-labeled oligos are desirable. For example, in order to make ligatible structures suitable for high throughput genotyping assays, a common set of four fluorescence dye labeled degenerate oligonucleotides may be utilized. The oligonucleotides are made degenerate so that they are reusable for all potential assays. If an oligonucleotide with a specific sequence is used, it would only work for one particular target.

Several important issues must be taken into consideration regarding the design of the degenerate oligonucleotides. One is the length of the oligonucleotide, which affects both ligation efficiency and the concentration of a specific sequence in the degenerate oligo population. For example, for a 6-7 mer oligo containing 5-6 degenerate bases, any given sequence would have a concentration of (¼)⁵-(¼)⁶ (i.e. 1/1024-1/4096) of the total. If the total concentration for a 7-mer degenerate oligo used in ligation is 1 μM, the effective concentration for any given sequence would be greater than 0.2 nM, a concentration at least 100 fold above the detection limit of a sequencing gel/capillary. Those of skill in the art will recognize that the optimal length of the degenerate oligo and its concentration may vary from experiment to experiment, depending on the level of degeneracy and multiplex used in the reaction. However, the oligo will generally be in the range of about 5 to about 20 bps in length with from about 4 to about 19 degenerate bases, and more preferably in the range of about 6 to about 8 bps in length with about 4 to about 6 degenerate bases.
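The effective concentration quoted above can be checked with a short calculation. The function name is hypothetical, and the sketch assumes that all four bases are equally represented at every degenerate position.

```python
def effective_concentration_nM(total_nM, n_degenerate):
    """Concentration of any one specific sequence in a degenerate oligo pool,
    assuming equal representation of A, C, G and T at each degenerate position."""
    return total_nM / (4 ** n_degenerate)


total_nM = 1000.0  # 1 uM total degenerate oligo

print(round(effective_concentration_nM(total_nM, 5), 3))  # 5 degenerate bases, 1/1024
print(round(effective_concentration_nM(total_nM, 6), 3))  # 6 degenerate bases, 1/4096
```

With six degenerate bases the share of each sequence is 1/4096 of 1 μM, about 0.24 nM, consistent with the statement that the effective concentration exceeds 0.2 nM.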

Another consideration is the melting temperature (Tm) of the degenerate oligonucleotide. As is well known to those of skill in the art, the Tm will vary depending on factors such as the GC content, the length of the oligo, and the like. For example, increasing the GC content and/or the length of an oligo increases its Tm. Increasing the length of the oligonucleotide will increase Tm, but would decrease the effective concentration for a specific sequence. In addition, other factors such as the ionic strength of the ligation buffer also affect Tm significantly. Other considerations involved in oligo design which are well-known to those of skill in the art include the number and position of bases which are degenerate. The number of degenerate bases will have a significant impact on the effective concentration and Tm. For example, if 5 degenerate bases are used within a 7 mer, 16 oligos must be synthesized, but if 6 degenerate bases are used, only 4 oligos are necessary. While one specified position must match the polymorphic site if multiple specified positions are used, the position does not necessarily have to be the last one on the 3′ end. Further, the position of the ligation site relative to the SNP site (the 5′ end or the 3′ end) may also be altered to optimize a given experiment. (Previous studies (Tong 2000; Tong 1999) have suggested that DNA ligase exhibits better allelic discrimination when the ligation site is on the 3′ end.) Those of skill in the art will recognize that many tools exist and are well-known to the skilled practitioner for optimizing such parameters, thereby facilitating the design of suitable oligos. (See, for example, Delahunty 1996; Tong 2000; Tong 1999). Examples include but are not limited to various computer software packages such as DNA Star, Oligo, Primer Express, etc.
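The oligo counts given above (16 versus 4) follow from the number of specified positions, as the following sketch enumerates. Placing the specified bases at the 3′ end and writing degenerate positions as N are illustrative assumptions only, since the specified positions need not be at the 3′ end; the function name is hypothetical.

```python
from itertools import product


def degenerate_oligo_set(length, n_degenerate):
    """List every oligo that must be synthesized: one per combination of
    specified bases, with the remaining positions degenerate (written as N)."""
    n_specified = length - n_degenerate
    return ["N" * n_degenerate + "".join(bases)
            for bases in product("ACGT", repeat=n_specified)]


five_degenerate = degenerate_oligo_set(7, 5)  # two specified positions
six_degenerate = degenerate_oligo_set(7, 6)   # one specified position

print(len(five_degenerate), five_degenerate[:4])
print(len(six_degenerate), six_degenerate)
```

Each additional specified position multiplies the number of oligos to synthesize by four, while raising the effective concentration of each specific sequence by the same factor.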

Other factors which may be considered include the type and concentration of ligase to be employed. For example, a 6-7 mer oligo with a moderate GC content should exhibit a melting temperature between 30-40° C., a Tm sufficient to perform a cyclic ligation assay. The use of a thermostable DNA ligase would provide an opportunity for multiple cycling of a ligation assay at this temperature, which would compensate for the lowered efficiency of the reaction.

Those of skill in the art will recognize that a plethora of protocol references, kits, etc. are readily available to facilitate carrying out such manipulations of the DNA. The precise nature of the protocols which are used is not a central feature of the present invention, and any of many suitable means for carrying out the various steps of PCR amplification and detection and analysis of the amplicons may be utilized in the practice of the present invention.

Those of skill in the art will recognize that a variety of additional well-known techniques may be reliably utilized in the practice of the present invention, examples of which include but are not limited to: mass spectroscopy, ELISA techniques, fluorescence intensity/energy transfer/polarization, various micro assays (e.g. DNA/RNA arrays), microfluidic devices, and magnetic/color-coded micro beads.

In a preferred embodiment of the present invention, the method of detection is capillary electrophoresis. The demand from the human genome project greatly stimulated technology development in DNA sequencing and, in return, improvements in DNA sequencing greatly speeded up the human genome project. As a result, capillary DNA sequencers became widely available to researchers in both academe and industry. Capillary DNA sequencers are highly sensitive and automated detection systems. They also have a relatively broad detection range, covering at least three orders of magnitude. Their analytical power is better than most signal detection systems currently used for SNP genotyping. Capillary DNA sequencing systems have been proven to deliver ultra high throughput in many DNA sequencing centers around the world.

Capillary electrophoresis has been used in DNA polymorphism analysis (Barta 2001; Medintz 2001). Most of these studies exploited the sensitivity and speed of separation. Lindblad-Toh et al (2000) used a DNA sequencer to increase throughput. In this study, the authors chemically synthesized oligonucleotides from 18-50 bases as extension primers and succeeded in scoring 14 SNPs in one lane (the size difference between primers was 4 bases; a few sets of oligos of the same length but of different sequences were used to score two different SNPs by different colors). This study essentially proved that DNA sequencers had the potential for high capacity SNP scoring using the SBE format, and that the length of primers was the limiting factor for throughput.

The wide availability, high detection sensitivity, high throughput capacity and automation readiness of capillary DNA sequencers call for high capacity applications, and SNP genotyping is one such application. If these two are united, considerable advances in cost and throughput would be achieved. Many did not consider DNA sequencers for high throughput SNP scoring mainly because it was not cost-effective. DNA sequencers separate DNA fragments by size and color. In order to separate genotyping products by a sequencer, the products must be different in size and/or color. To separate two SBE products, for example, SBE primers must be different in length or the extended bases must be different in color. Otherwise, the two products of the same length and same color would not be distinguished. While it is reasonable to chemically synthesize primers of 15-40 bases, this size range gives only limited multiplex capacity (Lindblad-Toh 2000). To chemically synthesize primers longer than 50 bases is not economical, especially in high throughput settings. But a sequencer can separate DNA fragments up to 600-800 bases. In the practice of the present invention as described above, SBE products can be arranged to be 5 bases apart for 600-800 bases, allowing the loading of over 100 SBE products in one lane. Multiplexing at this level makes it reasonable to use sequencers for SNP scoring. Furthermore, practical protocols for using DNA sequencers are well-established and many of the handling steps and procedures involved have been automated and are readily available for large scale applications.

The following examples are intended to be illustrative in nature and are not intended to limit the scope of the invention in any way.

EXAMPLES

Example 1 Formation of Sequencible Structures

Sets of nPCR primers were designed as described to amplify DNA fragments from six markers of human chromosome 6 and individual nPCR was performed using Taq-Gold DNA polymerases. Clean-up digestion was carried out using shrimp alkaline phosphatase and E. coli exonuclease I (37° C., 1 hour). After the clean-up digestion, a high temperature denaturing step (95° C. for 15 min.) was performed and the samples were allowed to cool to room temperature. A sequencing reaction was performed using standard sequencing kits but without sequencing primers. Sequencing reactions were performed either: 1) under a standard cyclic sequencing program: 25 cycles of 96° C. for 1 min., 50° C. for 15 sec. and 60° C. for 4 min; or 2) using this condition: 25 cycles of 96° C. for 1 min., 80° C. for 4 min. The purpose of these experiments was to test whether a sequencible structure formed after denaturing and reannealing of the outer and inner PCR products together. If we obtained sequences and the sequences matched the inner PCR primer position, this would confirm that sequencible structures did form as expected.

The results from experiments with two of the six markers are shown in FIGS. 3A and B and FIGS. 4A and B. These figures show the results obtained with the second sequencing reaction protocol. FIGS. 3A and 4A show PCR primer positions and orientations, and the expected sequences. Note that in FIG. 4, the inner PCR primer is designed in the same orientation as forward, so the sequence reading from a sequencing reaction would be reverse complementary to the sequences highlighted with italics. FIGS. 3B and 4B show sequencing reads from the ABI 377 sequencer analysis software.

For the 6 markers tested, we obtained sequences for five, and all of these sequences matched the expected sequences exactly. Full length sequence reads (from the ends of inner PCR products to the ends of outer PCR products) were obtained when the standard cyclic sequencing conditions were used (data not shown). Much shorter sequence readings (30-50 bases) were obtained when the extension temperature was changed to 80° C. (as in FIGS. 3 and 4). For 4 of the 5 markers that produced sequences, the sequence started right after the inner primers; one started 3 bases away from the end of the primer. For both sequencing conditions, the starting positions of sequencing were the same.

Example 2 Creating Sequencible Structures with Products from Extendase-Free Pfu DNA Polymerases that can be Used for SBE Genotyping Reactions

The variations in starting position of the obtained sequences (Example 1) suggested that the positioning of inner PCR primers alone may not be sufficient to obtain sequencible structures with precise extending ends. An alternative is the use of extendase-free DNA polymerases that do not add the 3′ A base at the ends of PCR products. Thus, experiments were carried out using Pfu DNA polymerases, which have been reported to lack the extendase activity (10, 28). Inner PCR primers were designed so that their products would be positioned right before the polymorphic site if the products formed sequencible structures. Inner and outer PCR reactions were performed separately using Pfu DNA polymerases and the products were pooled together to form the sequencible structures. Then an SBE reaction was performed using the SNaPshot kit from Applied Biosystems. SBE reactions were purified and analyzed by an ABI 377 DNA sequencer using GeneScan.

Three samples are shown in FIG. 5, representing three genotypes for the marker: homozygous A/A (upper panel), heterozygous A/G (middle panel) and homozygous G/G (lower panel). The experiments demonstrated that Pfu DNA polymerases produce inner PCR products with precise ends that can be used to form sequencible structures for SBE reactions and to score genotypes correctly. In a parallel experiment using Taq DNA polymerases, multiple products were observed and genotypes were not clear (data not shown). The results clearly show the differences between the two DNA polymerases.

Example 3 Control of the Reading Length for Multiple Target Sequencing

The reading length of sequencing reactions is principally determined by the ratio of dNTPs/ddNTPs. For typical sequencing applications, long sequence reads are obtained by using the high dNTPs/ddNTPs ratio (100:1) provided in commercial sequencing kits. In the practice of the present invention, a short stretch of DNA surrounding the SNP targets will be sequenced. For this purpose it is desirable to control the length of sequence reads so that multiple sequencing reactions can be loaded in one lane/capillary without overlapping each other. Currently there is no commercially available kit for this purpose, but two products, the SNaPshot from Applied Biosystems and the SNuPe from Amersham Biosciences, have the potential to be modified for this purpose. Both of these kits were designed for SBE reactions, containing only dye-labeled ddNTPs. The concentration of nucleotides in the mix is proprietary.

Individually available dye-labeled nucleotides were used to assemble a testing kit to demonstrate the principle that reading length can be controlled by the dNTPs/ddNTPs ratio. FIG. 6 shows an example of the sequencing reaction. In this reaction, dye-labeled acyclo-ddNTPs were used to assemble the mix. The sequencing reaction was performed using pGEM plasmid and M13-21 primer. The ratio of dNTPs/ddNTPs and thermal cycling conditions are detailed in the figure legend. The results showed a clean read of 30-40 bases after the primer.

This example demonstrated that the reading length can be effectively controlled by the ratio of dNTPs/ddNTPs. To practice the current invention, especially the MulTarSeq, a specially formulated sequencing mix is therefore desirable.

Example 4 Comparison of Two-Step PCR Amplification Using Primary and Secondary Primer Sets

Four SNP markers were randomly chosen from our SNP collection and PCR and nPCR primers were designed. Both outer and inner primers had two designs, a simple design and the two domain design. The sequences used for the common domains of the two domain design were the M13 universal forward primer: cccagtcacgacgttgtaaaacg (for forward outer PCR primers) and M13 universal reverse primer: agcggataacaatttcacacagg (for reverse outer PCR primers). For the inner PCR primer the common domain sequence was tcaGCAGCatGTCTCttcca, which contains recognition sites for two restriction enzymes (Bst71 I and Alw26 I, in capital letters). Multiplex PCR and multiplex nPCR reactions were performed to compare the primer designs and protocols. PCR products were then analyzed by agarose gel electrophoresis. FIG. 7 shows the results. Lane 1 of FIG. 7 represents a 4× multiplex PCR using the simple primer design. As can be seen, only two bands are visible (480, 510 bp). They could not be resolved by agarose gel electrophoresis. Lane 2 represents the same 4 amplicons, but generated using the two domain design. In this case, the two smaller products (400, 430 bp) are clearly seen. Note the size shift for the larger products due to the extra length (46 bp) in the two domain primers. Lane 3 represents 4× multiplex nPCR with the same amplicons but carried out by the two-step procedure. In addition to the 4 outer PCR products, 4 inner PCR products can also be seen.
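The 46 bp size shift and the positions of the capitalized recognition sites can be read directly from the common domain sequences given above, as in the following sketch. The sequences are those quoted in this example; the variable and function names are hypothetical, and the sketch only locates the capitalized stretches without assigning them to a particular enzyme.

```python
import re

# Common domain sequences as given in Example 4.
M13_FORWARD = "cccagtcacgacgttgtaaaacg"
M13_REVERSE = "agcggataacaatttcacacagg"
INNER_COMMON = "tcaGCAGCatGTCTCttcca"


def capitalized_sites(tag):
    """Return (sequence, 0-based start) for each capitalized stretch in the tag."""
    return [(m.group(), m.start()) for m in re.finditer(r"[A-Z]+", tag)]


# Extra length added to an outer amplicon by the forward and reverse common domains.
added_length = len(M13_FORWARD) + len(M13_REVERSE)
sites = capitalized_sites(INNER_COMMON)

print(added_length)
print(sites)
```

The two 23-base outer common domains together account for the 46 bp shift noted for the two domain primers, and the inner common domain carries two five-base recognition sites separated by two bases.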

This experiment demonstrates that the two domain design significantly improves both multiplex PCR and multiplex nPCR, and that the two-step procedure is especially effective in improving multiplex nPCR.

Example 5 Use of Multiplex nPCR to Create an Ordered Series of Sequencible Structures

This example is provided to illustrate how a series of sequencible structures is formed from two multiplexed PCRs. A set of five SNPs is selected and nPCR primers are designed using the two domain design. In order to run all products on a single sequencing lane/capillary, the inner PCR products are arranged to span the 150 to 400 bps range, 50 bps apart from one another. The common domain for the inner primers should contain a restriction enzyme recognition site to produce blunt ends after the amplification. Two 5× multiplex PCRs are performed separately: one uses only the outer PCR primers, the other uses only the inner PCR primers. After the amplification, an appropriate restriction enzyme is used to remove the common tag sequences from the inner end of the inner PCR products. The outer products and the inner products are then combined, purified, denatured and reannealed together. A single base extension reaction is performed using the SNaPshot kit from Applied Biosystems. The products are then purified and run on a DNA sequencer. The analysis of the extension products will identify the genotypes of the sample.

While the invention has been described in terms of its preferred embodiments, those skilled in the art will recognize that the invention can be practiced with modification within the spirit and scope of the appended claims. Accordingly, the present invention should not be limited to the embodiments as described above, but should further include all modifications and equivalents thereof within the spirit and scope of the description provided herein.

REFERENCES

-   Armstrong B, Stewart M, Mazumder A. 2000. Suspension arrays for high throughput, multiplexed single nucleotide polymorphism genotyping. Cytometry 40:102-8
-   Barta C, Ronai Z, Sasvari-Szekely M, Guttman A. 2001. Rapid single nucleotide polymorphism analysis by primer extension and capillary electrophoresis using polyvinyl pyrrolidone matrix. Electrophoresis 22:779-82
-   Bray M S, Boerwinkle E, Doris P A. 2001. High-throughput multiplex SNP genotyping with MALDI-TOF mass spectrometry: practice, problems and promise. Hum. Mutat. 17:296-304
-   Cai H, White P S, Tomey D, Deshpande A, Wang Z, Keller R A, Marrone B, Nolan J P. 2000. Flow cytometry-based minisequencing: a new platform for high-throughput single-nucleotide polymorphism scoring. Genomics 66:135-43
-   Chen X, Levine L, Kwok P Y. 1999. Fluorescence polarization in homogeneous nucleic acid analysis. Genome Res. 9:492-8
-   Chen X, Zehnbauer B, Gnirke A, Kwok P Y. 1997. Fluorescence energy transfer detection as a homogeneous DNA diagnostic method. Proc. Natl. Acad. Sci. U.S.A. 94:10756-61
-   Chen X, Livak K J, Kwok P Y. 1998. A homogeneous, ligase-mediated DNA diagnostic test. Genome Res. 8:549-56
-   Chen X, Kwok P Y. 1997. Template-directed dye-terminator incorporation (TDI) assay: a homogeneous DNA diagnostic method based on fluorescence resonance energy transfer. Nucleic Acids Res. 25:347-53
-   Cline J, Braman J C, Hogrefe H H. 1996. PCR fidelity of pfu DNA polymerase and other thermostable DNA polymerases. Nucleic Acids Res. 24:3546-51
-   Coljee V W, Murray H L, Donahue W F, Jarrell K A. 2000. Seamless gene engineering using RNA- and DNA-overhang cloning. Nat. Biotechnol. 18:789-91
-   Collins F S, Guyer M S, Chakravarti A. 1997. Variations on a theme: cataloging human DNA sequence variation. Science 278:1580-1
-   Delahunty C, Ankener W, Deng Q, Eng J, Nickerson D A. 1996. Testing the feasibility of DNA typing for human identification by PCR and an oligonucleotide ligation assay. Am. J. Hum. Genet. 58:1239-46
-   Dubertret B, Calame M, Libchaber A J. 2001. Single-mismatch detection using gold-quenched fluorescent oligonucleotides. Nat. Biotechnol. 19:365-70
-   Fan J B, Chen X, Halushka M K, Berno A, Huang X, Ryder T, Lipshutz R J, Lockhart D J, Chakravarti A. 2000. Parallel genotyping of human SNPs using generic high-density oligonucleotide tag arrays. Genome Res. 10:853-60
-   Ferguson J A, Steemers F J, Walt D R. 2000. High-density fiber-optic DNA random microsphere array. Anal. Chem. 72:5618-24
-   Fors L, Lieder K W, Vavra S H, Kwiatkowski R W. 2000. Large-scale SNP scoring from unamplified genomic DNA. Pharmacogenomics 1:219-29
-   Gal J, Schnell R, Szekeres S, Kalman M. 1999. Directional cloning of native PCR products with preformed sticky ends (autosticky PCR). Mol. Gen. Genet. 260:569-73
-   Gal J, Schnell R, Kalman M. 2000. Polymerase dependence of autosticky polymerase chain reaction. Anal. Biochem. 282:156-8
-   Germer S, Holland M J, Higuchi R. 2000. High-throughput SNP allele-frequency determination in pooled DNA samples by kinetic PCR. Genome Res. 10:258-66
-   Germer S, Higuchi R. 1999. Single-tube genotyping without oligonucleotide probes. Genome Res. 9:72-8
-   Gerry N P, Witowski N E, Day J, Hammer R P, Barany G, Barany F. 1999. Universal DNA microarray method for multiplex detection of low abundance point mutations. J. Mol. Biol. 292:251-62
-   Gilles P N, Wu D J, Foster C B, Dillon P J, Chanock S J. 1999. Single nucleotide polymorphic discrimination by an electronic dot blot assay on semiconductor microchips. Nat. Biotechnol. 17:365-70
-   Hu G. 1993. DNA polymerase-catalyzed addition of nontemplated extra nucleotides to the 3′ end of a DNA fragment. DNA Cell Biol. 12:763-70
-   Iannone M A, Taylor J D, Chen J, Li M S, Rivers P, Slentz-Kesler K A, Weiner M P. 2000. Multiplexed single nucleotide polymorphism genotyping by oligonucleotide ligation and flow cytometry. Cytometry 39:131-40
-   Kruglyak L. 1999. Prospects for whole-genome linkage disequilibrium mapping of common disease genes. Nat. Genet. 22:139-44
-   Kwok P Y. 2000. High-throughput genotyping assay approaches. Pharmacogenomics 1:95-100
-   Lindblad-Toh K, Winchester E, Daly M J, Wang D G, Hirschhorn J N, Laviolette J P, Ardlie K, Reich D E, Robinson E, Sklar P, Shah N, Thomas D, Fan J B, Gingeras T, Warrington J, Patil N, Hudson T J, Lander E S. 2000. Large-scale discovery and genotyping of single-nucleotide polymorphisms in the mouse. Nat. Genet. 24:381-6
-   Lindroos K, Liljedahl U, Raitio M, Syvanen A C. 2001. Minisequencing on oligonucleotide microarrays: comparison of immobilisation chemistries. Nucleic Acids Res. 29:E69
-   Long A D, Langley C H. 1999. The power of association studies to detect the contribution of candidate genetic loci to variation in complex traits. Genome Res. 9:720-31
-   Marras S A, Kramer F R, Tyagi S. 1999. Multiplex detection of single-nucleotide variations using molecular beacons. Genet. Anal. 14:151-6
-   Medintz I, Wong W W, Berti L, Shiow L, Tom J, Scherer J, Sensabaugh G, Mathies R A. 2001. High-performance multiplex SNP analysis of three hemochromatosis-related mutations with capillary array electrophoresis microplates. Genome Res. 11:413-21
-   Metzker M L, Lu J, Gibbs R A. 1996. Electrophoretically uniform fluorescent dyes for automated DNA sequencing. Science 271:1420-2
-   Myakishev M V, Khripin Y, Hu S, Hamer D H. 2001. High-throughput SNP genotyping by allele-specific PCR with universal energy-transfer-labeled primers. Genome Res. 11:163-9
-   Nikiforov T T, Rendle R B, Goelet P, Rogers Y H, Kotewicz M L, Anderson S, Trainor G L, Knapp M R. 1994. Genetic Bit Analysis: a solid phase method for typing single nucleotide polymorphisms. Nucleic Acids Res. 22:4167-75
-   Nordstrom T, Nourizad K, Ronaghi M, Nyren P. 2000. Method enabling pyrosequencing on double-stranded DNA. Anal. Biochem. 282:186-93
-   Pastinen T, Raitio M, Lindroos K, Tainola P, Peltonen L, Syvanen A C. 2000. A system for specific, high-throughput genotyping by allele-specific primer extension on microarrays. Genome Res. 10:1031-42
-   Pastinen T, Kurg A, Metspalu A, Peltonen L, Syvanen A C. 1997. Minisequencing: a specific tool for DNA analysis and diagnostics on oligonucleotide arrays. Genome Res. 7:606-14
-   Pastinen T, Partanen J, Syvanen A C. 1996. Multiplex, fluorescent, solid-phase minisequencing for efficient screening of DNA sequence variation. Clin. Chem. 42:1391-7
-   Perler F B, Kumar S, Kong H. 1996. Thermostable DNA polymerases. Adv. Protein Chem. 48:377-435
-   Prince J A, Feuk L, Howell W M, Jobs M, Emahazion T, Blennow K, Brookes A J. 2001. Robust and accurate single nucleotide polymorphism genotyping by dynamic allele-specific hybridization (DASH): design criteria and assay validation. Genome Res. 11:152-62
-   Riley J H, Allan C J, Lai E, Roses A. 2000. The use of single nucleotide polymorphisms in the isolation of common disease genes. Pharmacogenomics 1:39-47
-   Ronaghi M. 2001. Pyrosequencing sheds light on DNA sequencing. Genome Res. 11:3-11
-   Ross P, Hall L, Smirnov I, Haff L. 1998. High level multiplex genotyping by MALDI-TOF mass spectrometry. Nat. Biotechnol. 16:1347-51
-   Sachidanandam R, Weissman D, Schmidt S C, Kakol J M, Stein L D, Marth G, Sherry S, Mullikin J C, Mortimore B J, Willey D L, Hunt S E, Cole C G, Coggill P C, Rice C M, Ning Z, Rogers J, Bentley D R, Kwok P Y, Mardis E R, Yeh R T, Schultz B, Cook L, Davenport R, Dante M, Fulton L, Hillier L, Waterston R H, McPherson J D, Gilman B, Schaffner S, Van Etten W J, Reich D, Higgins J, Daly M J, Blumenstiel B, Baldwin J, Stange-Thomann N, Zody M C, Linton L, Lander E S, Altshuler D. 2001. A map of human genome sequence variation containing 1.42 million single nucleotide polymorphisms. Nature 409:928-33
-   Sauer S, Lechner D, Berlin K, Plancon C, Heuermann A, Lehrach H, Gut I G. 2000. Full flexibility genotyping of single nucleotide polymorphisms by the GOOD assay. Nucleic Acids Res. 28:E100
-   Shi M M. 2001. Enabling large-scale pharmacogenetic studies by high-throughput mutation detection and genotyping technologies. Clin. Chem. 47:164-72
-   Shuber A P. 1999. U.S. Pat. No. 5,882,856
-   Stump M D, Cherry J L, Weiss R B. 1999. The use of modified primers to eliminate cycle sequencing artifacts. Nucleic Acids Res. 27:4642-8
-   Syvanen A C. 1999. From gels to chips: “minisequencing” primer extension for analysis of point mutations and single nucleotide polymorphisms. Hum. Mutat. 13:1-10
-   Tong J, Cao W, Barany F. 1999. Biochemical properties of a high fidelity DNA ligase from Thermus species AK16D. Nucleic Acids Res. 27:788-94
-   Tong J, Barany F, Cao W. 2000. Ligation reaction specificities of an NAD(+)-dependent DNA ligase from the hyperthermophile Aquifex aeolicus. Nucleic Acids Res. 28:1447-54
-   Woolley A T, Guillemette C, Li C C, Housman D E, Lieber C M. 2000. Direct haplotyping of kilobase-size DNA using carbon nanotube probes. Nat. Biotechnol. 18:760-3
-   Ye F, Li M S, Taylor J D, Nguyen Q, Colton H M, Casey W M, Wagner M, Weiner M P, Chen J. 2001. Fluorescent microsphere-based readout technology for multiplexed human single nucleotide polymorphism analysis and bacterial identification. Hum. Mutat. 17:305-16

1. A method of producing hybrid DNA with a single strand overhang that includes a target site, comprising the steps of: obtaining a first primer which hybridizes to a 5′ strand of a strand of deoxyribonucleic acid (DNA), a second primer which hybridizes to a 3′ strand of said strand of DNA, and a third primer which hybridizes to said 3′ strand of DNA; producing by nested polymerase chain reaction (PCR) using said first primer, said second primer, and said third primer, an outer amplicon which includes said target site and an inner amplicon which excludes said target site; and forming at least one of a ligatible structure which includes a 3′-5′ sequence which excludes said target site hybridized to a 5′-3′ sequence which includes said target site, and a sequencible structure which includes a 5′-3′ sequence which excludes said target site hybridized to a 3′-5′ sequence which includes said target site.
 2. The method of claim 1 wherein said forming step forms both said ligatible structure and said sequencible structure.
 3. The method of claim 1 wherein said target site includes at least one single nucleotide polymorphism.
 4. A method of genotyping DNA of an individual by analyzing at least one target site in said DNA, comprising the steps of: obtaining a first primer which hybridizes to a 5′ strand of a strand of said DNA, a second primer which hybridizes to a 3′ strand of said strand of DNA, and a third primer which hybridizes to said 3′ strand of DNA; producing by nested polymerase chain reaction (PCR) using said first primer, said second primer, and said third primer, an outer amplicon which includes said target site and an inner amplicon which excludes said target site; and forming at least one of a sequencible structure which includes a 5′-3′ sequence which excludes said target site hybridized to a 3′-5′ sequence which includes said target site; and a ligatible structure which includes a 3′-5′ sequence which excludes said target site hybridized to a 5′-3′ sequence which includes said target site; and analyzing at least one of a sequencing product formed by sequencing said sequencible structure and a ligation product formed by ligating said ligatible structure with a labeled oligonucleotide with a DNA sequencer to determine the genotype of said individual.
 5. The method of claim 4 wherein said forming step forms both said ligatible structure and said sequencible structure.
 6. The method of claim 4 wherein said at least one target site includes a single nucleotide polymorphism.
 7. The method of claim 4 wherein said sequencible structure is sequenced by a technique selected from the group consisting of dideoxy sequencing, pyrosequencing, and single base extension.
 8. The method of claim 4 wherein a plurality of target sites are simultaneously analyzed by the step of producing an ordered series of sequencing products of varying, non-overlapping length each specific for a particular target site, and wherein said step of analyzing is carried out by electrophoresing said ordered series of sequencing products in a single channel of said DNA sequencer.
 9. The method of claim 8 wherein said step of producing an ordered series of sequencing products is carried out by single base extension (SBE).
 10. The method of claim 8 wherein said step of producing an ordered series of sequencing products is carried out by a dideoxy sequencing reaction utilizing a low ratio of dNTPs to ddNTPs.
 11. The method of claim 4 wherein said labeled oligonucleotide is fluorescently labeled.
 12. The method of claim 4 wherein a plurality of ligatible structures are formed in said forming step, each being specific for one of a plurality of target sites, and said ligation products are of varying, non-overlapping lengths with said labeled oligonucleotides being degenerate, said step of analyzing is carried out by electrophoresing a plurality of ligation products in a single channel of said DNA sequencer.
 13. A method for analyzing at least one target site in a DNA molecule, comprising the steps of amplifying by nested PCR said target site, wherein said nested PCR is carried out using inner and outer PCR primer pairs, wherein said outer PCR primer pair forms a first PCR product which contains said target site, and wherein said inner PCR primer pair forms a second PCR product which contains a portion of said first PCR product but does not contain said target site, denaturing said first and said second PCR products to form ssDNA sequences, reannealing said ssDNA sequences to form a sequencible hybrid DNA molecule and a ligatible hybrid DNA molecule, performing sequencing reactions with said sequencible hybrid DNA molecule and ligation reactions with said ligatible hybrid DNA molecule, and determining the characteristics of said target site by analyzing results obtained in said performing step.
 14. The method of claim 13 wherein said at least one target site is an SNP polymorphism site.
 15. The method of claim 13 wherein said inner and outer primer pairs comprise a sequence tag.
 16. The method of claim 15 wherein said sequence tag comprises a restriction enzyme recognition site.
 17. The method of claim 13 wherein said step of amplifying is carried out using a low concentration of primers, and further comprising a second step of amplifying, wherein said second step of amplifying uses secondary primers for amplification of said sequence tags.
 18. The method of claim 13 wherein one target site is analyzed.
 19. The method of claim 13 wherein a plurality of target sites are analyzed.
 20. The method of claim 13 wherein said step of amplifying is carried out in a single multiplex PCR reaction.
 21. The method of claim 13 wherein said step of amplifying is carried out in multiple independent PCR reactions.
 22. The method of claim 13 wherein said results obtained in said performing step are analyzed by a DNA sequencer.
 23. Inner and outer PCR primer pairs for the amplification of a target site in a DNA molecule, wherein said outer PCR primer pair forms a first PCR product which contains said target site, and wherein said inner PCR primer pair forms a second PCR product which contains a portion of said first PCR product but does not contain said target site.
 24. The primer pairs of claim 23, wherein said target site is an SNP polymorphism site.
 25. The primer pairs of claim 23, wherein said primer pairs comprise a sequence tag.
 26. A kit for amplification of at least one target site in a DNA molecule, comprising inner and outer PCR primer pairs for the amplification of said target site, wherein said outer primer pair amplifies a portion of said DNA molecule including said target site, and wherein said inner primer pair amplifies part of said portion of said DNA molecule amplified by said outer primer pair and excludes said target site.
 27. The kit of claim 26 wherein said inner and outer PCR primer pairs comprise sequence tags.
 28. The kit of claim 27, further comprising secondary primers for the amplification of said sequence tags.
 29. A dideoxy DNA sequencing kit for producing short chain termination fragments, comprising dNTPs and ddNTPs present in a dNTP:ddNTP ratio ranging from 1:0 to 1:10.